# Pipeline Settings

> Configure the EDL pipeline execution behavior with three key flags

## Overview

The EDL pipeline can be configured using three boolean flags at the top of `run_full_pipeline.py`. These settings control data fetching behavior, optional datasets, and cleanup operations.

## Configuration Flags

All configuration flags are located in `run_full_pipeline.py` at lines 61-71:

```python
# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True

# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False

# Auto-delete intermediate files after pipeline succeeds
# Keeps: all_stocks_fundamental_analysis.json.gz + ohlcv_data/
CLEANUP_INTERMEDIATE = True
```

### FETCH\_OHLCV

<ParamField path="FETCH_OHLCV" type="boolean" default={true}>
  Controls whether to fetch historical OHLCV (Open, High, Low, Close, Volume) data for all stocks.
</ParamField>

**Behavior:**

* **`True`**: Fetches lifetime OHLCV data using smart incremental updates
  * First run: \~30 minutes (downloads full history from 1976)
  * Subsequent runs: \~2-5 minutes (only fetches new data)
  * Enables ADR, RVOL, ATH, and % from ATH calculations
* **`False`**: Skips OHLCV fetching entirely
  * Saves the fetch time listed above (\~2-5 minutes on an incremental run, \~30 minutes on a first run)
  * Fields that depend on OHLCV will show `0` or `null`:
    * `5/14/20/30 Days MA ADR(%)`
    * `RVOL`
    * `% from ATH`
    * `Returns since Earnings(%)`
    * `Max Returns since Earnings(%)`

**When to disable:**

* Testing pipeline changes without needing price data
* Running quick fundamental-only refreshes
* Network bandwidth constraints

**Files affected:**

* Creates/updates: `ohlcv_data/{SYMBOL}.csv` (one file per stock)
* Creates/updates: `indices_ohlcv_data/` directory for index data
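
The incremental behavior described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the helper `next_fetch_start`, the `date` column name, and the ISO date format are all assumptions about how `ohlcv_data/{SYMBOL}.csv` might be laid out.

```python
import csv
from datetime import date, timedelta
from pathlib import Path

# Assumption: history begins in 1976 (per this page); the exact day is illustrative.
FULL_HISTORY_START = date(1976, 1, 1)


def next_fetch_start(csv_path: Path, date_column: str = "date") -> date:
    """Return the first date that still needs fetching for one symbol.

    Hypothetical helper: assumes the CSV has a header row and an
    ISO-formatted (YYYY-MM-DD) date column named `date_column`.
    """
    if not csv_path.exists():
        return FULL_HISTORY_START  # first run: download the full history

    with csv_path.open(newline="") as handle:
        dates = [
            row[date_column]
            for row in csv.DictReader(handle)
            if row.get(date_column)
        ]

    if not dates:
        return FULL_HISTORY_START  # empty file: treat it like a first run

    last_stored = max(date.fromisoformat(value) for value in dates)
    return last_stored + timedelta(days=1)  # resume the day after the last row
```

A missing file falls back to the full-history start date, which would explain why a first run takes far longer than later runs.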

### FETCH\_OPTIONAL

<ParamField path="FETCH_OPTIONAL" type="boolean" default={false}>
  Enables fetching of standalone datasets not included in the main pipeline output.
</ParamField>

**Behavior:**

* **`True`**: Runs PHASE 6 scripts to fetch:
  * All market indices (`all_indices_list.json`) - 194 indices
  * ETF data (`etf_data_response.json`) - 361 ETFs
* **`False`**: Skips PHASE 6 entirely

**What gets fetched:**

| Script                 | Output File              | Records | Description                            |
| ---------------------- | ------------------------ | ------- | -------------------------------------- |
| `fetch_all_indices.py` | `all_indices_list.json`  | 194     | Nifty 50, Bank Nifty, sectoral indices |
| `fetch_etf_data.py`    | `etf_data_response.json` | 361     | All exchange-traded funds              |

**Note:** These files are standalone and **not merged** into `all_stocks_fundamental_analysis.json.gz`. They're used separately by the frontend for index tracking and ETF screening.

**When to enable:**

* You need fresh index composition data
* Building ETF comparison features
* Running a full data refresh for all asset classes
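
To confirm that PHASE 6 produced both standalone files, a quick check like the one below may help. It is a sketch under assumptions: `summarize_standalone` is a hypothetical helper, and it assumes each file holds a JSON array or object at the top level, which this page does not confirm.

```python
import json
from pathlib import Path

STANDALONE_FILES = ["all_indices_list.json", "etf_data_response.json"]


def summarize_standalone(base_dir: Path) -> dict:
    """Map each standalone file name to its top-level element count.

    Missing files map to None. Assumption: the top-level JSON value is a
    list or dict; any other shape is also reported as None.
    """
    summary = {}
    for name in STANDALONE_FILES:
        path = base_dir / name
        if not path.exists():
            summary[name] = None  # PHASE 6 did not run or the fetch failed
            continue
        with path.open(encoding="utf-8") as handle:
            payload = json.load(handle)
        summary[name] = len(payload) if isinstance(payload, (list, dict)) else None
    return summary
```

If the files are flat arrays, the counts should land near the record totals in the table above; a `None` value suggests the flag was off or the script did not finish.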

### CLEANUP\_INTERMEDIATE

<ParamField path="CLEANUP_INTERMEDIATE" type="boolean" default={true}>
  Auto-deletes intermediate files after successful pipeline completion.
</ParamField>

**Behavior:**

* **`True`**: Removes all intermediate files and directories after compression
  * Keeps only: `*.json.gz` files + `ohlcv_data/` + `indices_ohlcv_data/`
  * Frees \~150-200 MB of disk space
* **`False`**: Preserves all intermediate files for debugging

**Files deleted when enabled:**

```python
INTERMEDIATE_FILES = [
    "master_isin_map.json",
    "dhan_data_response.json",
    "fundamental_data.json",
    "advanced_indicator_data.json",
    "all_company_announcements.json",
    "upcoming_corporate_actions.json",
    "history_corporate_actions.json",
    "nse_asm_list.json",
    "nse_gsm_list.json",
    "bulk_block_deals.json",
    "upper_circuit_stocks.json",
    "lower_circuit_stocks.json",
    "incremental_price_bands.json",
    "complete_price_bands.json",
    "nse_equity_list.csv",
    "all_stocks_fundamental_analysis.json",  # Raw JSON (after .gz is made)
]

INTERMEDIATE_DIRS = [
    "company_filings/",
    "market_news/",
]
```
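
A cleanup step driven by lists like these could look roughly like the sketch below. The function name `cleanup_intermediate` and its signature are hypothetical; the real implementation in `run_full_pipeline.py` may differ.

```python
import shutil
from pathlib import Path


def cleanup_intermediate(base_dir: Path, files: list[str], dirs: list[str]) -> list[str]:
    """Delete the named files and directories under base_dir.

    Returns the names that were actually removed. Entries that do not
    exist are skipped silently, so the function is safe to call twice.
    """
    removed = []
    for name in files:
        path = base_dir / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    for name in dirs:
        path = base_dir / name  # pathlib ignores the trailing slash
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

Calling it with `INTERMEDIATE_FILES` and `INTERMEDIATE_DIRS` leaves anything not named in those lists untouched, which matches the documented behavior of keeping the `.json.gz` output and `ohlcv_data/`.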

**When to disable:**

* Debugging pipeline failures
* Inspecting intermediate data quality
* Running custom analysis on raw outputs
* Developing new pipeline stages

## Modifying Configuration

<Steps>
  <Step title="Open the pipeline runner">
    Navigate to the EDL Pipeline directory:

    ```bash
    cd "DO NOT DELETE EDL PIPELINE"
    ```
  </Step>

  <Step title="Edit run_full_pipeline.py">
    Open the file in your editor:

    ```bash
    nano run_full_pipeline.py
    # or
    vim run_full_pipeline.py
    ```
  </Step>

  <Step title="Update the flags (lines 64-71)">
    Modify the values according to your needs:

    ```python
    FETCH_OHLCV = True           # Set to False to skip OHLCV
    FETCH_OPTIONAL = True        # Set to True to fetch indices & ETFs
    CLEANUP_INTERMEDIATE = False # Set to False to keep intermediate files
    ```
  </Step>

  <Step title="Save and run the pipeline">
    ```bash
    python3 run_full_pipeline.py
    ```
  </Step>
</Steps>

## Common Configuration Scenarios

### Quick Fundamental Refresh (No OHLCV)

```python
FETCH_OHLCV = False
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = True
```

**Runtime:** \~4 minutes\
**Use case:** Testing, quick fundamental updates

### Full Production Refresh

```python
FETCH_OHLCV = True
FETCH_OPTIONAL = True
CLEANUP_INTERMEDIATE = True
```

**Runtime:** \~35 minutes (first run), \~8 minutes (incremental)\
**Use case:** Daily automated refresh, complete data update

### Development/Debugging Mode

```python
FETCH_OHLCV = True
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = False
```

**Runtime:** \~30 minutes (first run), \~6 minutes (incremental)\
**Use case:** Inspecting intermediate outputs, debugging pipeline stages
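
If you switch between these scenarios often, the three combinations can be captured as named presets. The sketch below is purely illustrative: `run_full_pipeline.py` is documented as using plain module-level flags, so `PipelineFlags`, `PRESETS`, and `resolve_preset` are hypothetical names, not part of the pipeline.

```python
from typing import NamedTuple


class PipelineFlags(NamedTuple):
    fetch_ohlcv: bool
    fetch_optional: bool
    cleanup_intermediate: bool


# Hypothetical preset names mapped to the flag values from the scenarios above.
PRESETS = {
    "quick": PipelineFlags(fetch_ohlcv=False, fetch_optional=False, cleanup_intermediate=True),
    "production": PipelineFlags(fetch_ohlcv=True, fetch_optional=True, cleanup_intermediate=True),
    "debug": PipelineFlags(fetch_ohlcv=True, fetch_optional=False, cleanup_intermediate=False),
}


def resolve_preset(name: str) -> PipelineFlags:
    """Look up a preset by name, failing loudly on a typo."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(
            f"Unknown preset {name!r}; choose from {sorted(PRESETS)}"
        ) from None
```

For example, `resolve_preset("debug")` yields the flag values to copy into `FETCH_OHLCV`, `FETCH_OPTIONAL`, and `CLEANUP_INTERMEDIATE` by hand.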

## Impact on Output Fields

When `FETCH_OHLCV = False`, the following fields in `all_stocks_fundamental_analysis.json.gz` will be `0` or `null`:

| Field                           | Default Value (No OHLCV) |
| ------------------------------- | ------------------------ |
| `5 Days MA ADR(%)`              | `0`                      |
| `14 Days MA ADR(%)`             | `0`                      |
| `20 Days MA ADR(%)`             | `0`                      |
| `30 Days MA ADR(%)`             | `0`                      |
| `RVOL`                          | `0`                      |
| `% from ATH`                    | `0`                      |
| `Returns since Earnings(%)`     | `0`                      |
| `Max Returns since Earnings(%)` | `0`                      |

All other fundamental, technical indicator, and news fields remain unaffected.
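
To check whether a given output file was built with OHLCV data, you could scan it for non-zero values in the fields listed above. This sketch assumes the compressed file decodes to a JSON array of objects keyed by those field names, which this page does not state; `ohlcv_fields_populated` is a hypothetical helper.

```python
import gzip
import json
from pathlib import Path

OHLCV_FIELDS = [
    "5 Days MA ADR(%)",
    "14 Days MA ADR(%)",
    "20 Days MA ADR(%)",
    "30 Days MA ADR(%)",
    "RVOL",
    "% from ATH",
    "Returns since Earnings(%)",
    "Max Returns since Earnings(%)",
]


def ohlcv_fields_populated(output_path: Path) -> bool:
    """Return True if any record has a non-zero, non-null OHLCV-derived field.

    Assumption: the file is a gzip-compressed JSON array of objects.
    """
    with gzip.open(output_path, "rt", encoding="utf-8") as handle:
        records = json.load(handle)
    return any(
        record.get(field) not in (0, None)
        for record in records
        for field in OHLCV_FIELDS
    )
```

A `False` result on `all_stocks_fundamental_analysis.json.gz` suggests the last run had `FETCH_OHLCV = False`.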
